A Large-Vocabulary Continuous Speech Recognition Algorithm and its Application to a Multi-Modal Telephone Directory Assistance System
Abstract
1. INTRODUCTION

One of the main problems in very-large-vocabulary continuous speech recognition is how to reduce the search space accurately and efficiently without pruning the correct candidate. Our speech recognition system is based on the HMM-LR algorithm [1], which uses a generalized LR parser [2] as the language model and hidden Markov models (HMMs) as phoneme models. Applying this algorithm to large-vocabulary continuous speech requires (1) accurate scoring of phoneme sequences, (2) reduction of the trellis calculation, and (3) efficient pruning of phoneme-sequence candidates. For the first requirement, several speech recognition algorithms have been proposed that calculate the backward trellis likelihood from the end of the utterance in addition to the forward trellis likelihood [3][4]. We also use forward and backward trellis likelihoods for accurate scoring. For the second requirement, we use an adjusting window, which selects only the probable part of the trellis according to the predicted phoneme. For the third requirement, we use an algorithm that merges candidates having the same allophonic phoneme sequences and the same context-free grammar states [5]. In addition, candidates are also merged at the meaning level [6].

Speech HMMs are sensitive to background noise, which often causes a large decrease in recognition accuracy. One solution is to train HMMs on noisy speech to obtain the corresponding optimum HMMs. For large-vocabulary continuous speech recognition, however, the computational load of this solution becomes too high, because all the HMMs must be retrained each time the characteristics of the background noise (such as its level) change. Taking inspiration from HMM decomposition [7], we proposed an HMM-composition technique that easily adapts a speech recognition system based on clean-speech HMMs to background noise [8]. This technique is similar to that of Nolasco Flores et al. [9], which was investigated independently.

Providing access to directory information via spoken names and addresses is an interesting and useful application of large-vocabulary continuous speech recognition technology in telecommunication networks. Although many systems based on recognizing spelled names are being investigated, it is unreasonable to expect users to spell correctly the names of the people whose telephone numbers they want. In addition, several sets of letters have similar pronunciations, such as the English E-rhyming letters, and people's pronunciation when spelling out another person's name is often unstable, since this is not a familiar action. It is therefore not easy to recognize alphabetically spelled names correctly, and a more successful approach may be to recognize naturally spoken names, even if the machine has to recognize hundreds of thousands of names. We applied our speech recognition technology to a directory assistance system that recognizes names and addresses spoken continuously in Japanese. The system was evaluated from the standpoint of the human-machine interface.

2. SPEECH RECOGNITION ALGORITHM

2.1. Two-Stage LR Parser

Figure 1 shows the structure of our continuous speech recognition system for telephone directory assistance. We have developed a two-stage LR parser that uses two classes of LR tables: a main grammar table and several sub-grammar tables. These grammar tables are compiled separately from a context-free grammar. The sub-grammar tables deal with semantically classified items, such as city names, town names, block numbers, and subscriber names. The main grammar table controls the relationships between these semantic items.
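The division of labor between the main grammar and the sub-grammars can be illustrated with a toy example. The sketch below is not an LR parser and not the authors' implementation. It assumes, hypothetically, that the main grammar is a fixed sequence of semantic items and that each sub-grammar is a small list of word sequences. It then shows how a word string is segmented into items by expanding each item through its own sub-grammar. `MAIN_GRAMMAR`, `SUB_GRAMMARS`, `parse`, and all vocabulary entries are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical main grammar: the order of semantic items in an address query.
MAIN_GRAMMAR: List[str] = ["CITY", "TOWN", "BLOCK", "NAME"]

# Hypothetical sub-grammars: each semantic item expands to word sequences.
SUB_GRAMMARS: Dict[str, List[Tuple[str, ...]]] = {
    "CITY": [("musashino", "shi"), ("mitaka", "shi")],
    "TOWN": [("midori", "cho"), ("naka", "cho")],
    "BLOCK": [("san", "chome"), ("go", "chome")],
    "NAME": [("yamada",), ("suzuki",)],
}


def parse(words: List[str], slot: int = 0, pos: int = 0) -> Optional[Dict[str, str]]:
    """Return an item -> phrase assignment if `words` fits the main grammar."""
    if slot == len(MAIN_GRAMMAR):
        # All semantic items consumed: succeed only if all words are used.
        return {} if pos == len(words) else None
    item = MAIN_GRAMMAR[slot]
    # Try every phrase the sub-grammar allows for this item at this position.
    for phrase in SUB_GRAMMARS[item]:
        end = pos + len(phrase)
        if tuple(words[pos:end]) == phrase:
            rest = parse(words, slot + 1, end)
            if rest is not None:
                return {item: " ".join(phrase), **rest}
    return None


print(parse("mitaka shi naka cho go chome suzuki".split()))
# {'CITY': 'mitaka shi', 'TOWN': 'naka cho', 'BLOCK': 'go chome', 'NAME': 'suzuki'}
```

In this toy, replacing the `NAME` list does not require touching `MAIN_GRAMMAR`, which loosely mirrors the separate compilation of the main and sub-grammar tables described above.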
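The introduction's first and third requirements, scoring candidates with both forward and backward trellis likelihoods and merging candidates that share the same phoneme sequence and grammar state, can be sketched together. This is a minimal illustration under stated assumptions, not the HMM-LR implementation. Each candidate is assumed to carry its phoneme sequence, a parser-state ID, a forward log-likelihood, and a precomputed backward log-likelihood for the rest of the utterance. Merging is assumed to keep the highest-scoring candidate per key, and pruning is a simple score beam. The names `Candidate`, `merge_and_prune`, and `beam` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Candidate:
    phonemes: Tuple[str, ...]  # phoneme sequence hypothesized so far
    lr_state: int              # hypothetical ID of the LR parser state
    forward: float             # forward log-likelihood up to the current point
    backward: float            # assumed precomputed backward log-likelihood to the end

    @property
    def score(self) -> float:
        # Combined score: forward part plus backward part for the remainder.
        return self.forward + self.backward


def merge_and_prune(cands: List[Candidate], beam: float) -> List[Candidate]:
    """Merge candidates with identical (phonemes, lr_state), then beam-prune."""
    best: Dict[Tuple[Tuple[str, ...], int], Candidate] = {}
    for c in cands:
        key = (c.phonemes, c.lr_state)
        # Same phonemes and same grammar state imply identical future
        # expansions, so one representative (the best-scoring) is enough.
        if key not in best or c.score > best[key].score:
            best[key] = c
    merged = list(best.values())
    if not merged:
        return []
    top = max(c.score for c in merged)
    # Keep candidates whose combined score lies within `beam` of the best one.
    return sorted((c for c in merged if c.score >= top - beam),
                  key=lambda c: c.score, reverse=True)


cands = [
    Candidate(("y", "a", "m"), 7, -120.0, -300.0),
    Candidate(("y", "a", "m"), 7, -125.0, -300.0),  # same key: merged away
    Candidate(("s", "u", "z"), 3, -118.0, -330.0),
    Candidate(("m", "i", "t"), 5, -200.0, -310.0),  # outside the beam: pruned
]
for c in merge_and_prune(cands, beam=50.0):
    print(c.phonemes, c.score)
# ('y', 'a', 'm') -420.0
# ('s', 'u', 'z') -448.0
```

Ranking by forward plus backward likelihood, rather than forward likelihood alone, lets candidates that have consumed different amounts of the utterance be compared on a common footing before pruning.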
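The HMM-composition idea from the introduction can be illustrated for mean vectors only. Assume speech and noise are additive in the linear power-spectral domain, and that a clean-speech mean and a noise mean are given as log-spectral vectors. They can then be combined channel by channel as $\hat{\mu}_i = \log\left(e^{\mu^{s}_i} + k\,e^{\mu^{n}_i}\right)$, where $k$ is an assumed noise-level gain. This is the simple log-add approximation. The authors' technique [8] starts from cepstral HMMs and may also compose variances, which this sketch ignores. `compose_mean` and `k` are illustrative names.

```python
import math
from typing import List


def compose_mean(speech_log: List[float], noise_log: List[float], k: float = 1.0) -> List[float]:
    """Log-add composition of log-spectral mean vectors (means only, no variances)."""
    if len(speech_log) != len(noise_log):
        raise ValueError("speech and noise means must have the same dimension")
    if k <= 0:
        raise ValueError("noise gain k must be positive")
    out = []
    for s, n in zip(speech_log, noise_log):
        # log(exp(s) + k*exp(n)), computed stably by factoring out the larger term.
        a, b = s, math.log(k) + n
        hi, lo = max(a, b), min(a, b)
        out.append(hi + math.log1p(math.exp(lo - hi)))
    return out


print([round(v, 4) for v in compose_mean([2.0, 0.0], [0.0, 0.0])])
# [2.1269, 0.6931]
```

A channel where speech dominates barely moves, while a channel where speech and noise have equal energy rises by log 2. Changing the noise level only requires recomposing with a new `k` or noise mean, not retraining the speech models.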
Publication date: 1994